How do two completely different musical moods take shape in data? This dashboard presents a comparative analysis of two AI-generated tracks I created using Stableaudio AI. One track explores ambient minimalism, while the other is a high-energy breakbeat.
Track 1: Meditative Ambient Soundscape
- Tags: Ambient, Post-Rock, Minimalist, Reverb, 60 BPM
Track 2: Energetic Breakbeat Rave
- Tags: Breakbeat, Acid Breaks, Chaotic, 135 BPM
As an Information Science student, I found both computational music analysis and music production to be fairly new territory at the start of this project. While I was comfortable working with data, exploring music through features like tempo, valence, or timbre was entirely new to me. Before this project I had only limited knowledge of music theory, although I have always had a casual interest in music in the broadest sense. On the computational side I did have some RStudio experience, though with different types of data and visualizations than those explored in this course, so even that was partly novel for me.
Starting off, I experimented with multiple AI music generators, but many of them felt too rigid or unpredictable. Eventually, I settled on Stableaudio AI for its detailed prompt control and more satisfying sonic results. My goal was to create two starkly contrasting tracks: one calm and ambient, the other chaotic and energetic, so that I could explore the musical and data-level differences more clearly across various analytical features.
If I had more time or production experience, I would have loved to compose or produce my own music from scratch. Given the time constraints and learning curve, however, using AI allowed me to focus more on the analytical side of this project, with the AI tracks serving as case studies to compare both with each other and with the class corpus.
Ultimately, my goal during this project was not just the generation of music, but rather to develop a deeper understanding of how the differences between two songs become visible through data.
For this project, I generated two original pieces of music using Stableaudio AI. My goal was to create two strongly contrasting tracks: one meditative and slow-paced, the other aggressive and heavier in terms of rhythm. This contrast helped me make more insightful comparisons when analyzing musical features across mood, energy, and structure.
To shape each track, I wrote detailed prompts inspired by genre tags from RateYourMusic.com, including emotional and instrumental descriptors.
Track 1: Meditative Ambient Soundscape
Style: Ambient, Post-Rock, Cinematic
Length: 2 minutes
Goal: A calm, meditative ambient soundscape with minimal instrumentation.
Tags Used: Ambient, Post-Rock, Cinematic, Ethereal, Soothing, Meditative, Minimalist, Warm Subtle Bass, Deep Drones, Airy Pads, Textures, Analog Synths, Field Recordings, Wind Sounds, Reverb, 60 BPM
Track 2: Energetic Breakbeat Rave
Style: Breakbeat, Acid Breaks, 90s Rave
Length: 2 minutes
Goal: A high-energy, chaotic breakbeat track.
Tags Used: Breakbeat, Acid Breaks, 90s Rave, Energetic, Raw, Funky, Chaotic, Breakbeats, Deep Bass, Distorted 808, Acid Bass, Filtered Chords, Reversed Pads, Vocal Chops, 135 BPM
Creating the tracks involved three steps:

1. Prompt Design: I began by exploring genre tags on RateYourMusic, collecting descriptors that captured the mood, instrumentation, and texture I wanted. I made sure to include detailed keywords around tempo, style, and sound design to give Stableaudio clear guidance.
2. Generation: I then entered these prompts into Stableaudio AI to generate multiple versions of each track. It took some trial and error to get the sonic qualities I envisioned, especially for the ambient piece, where subtle dynamics and layered space were essential.
3. Finalization: Once I had promising outputs, I evaluated them critically, comparing the quality of the results across several prompts. I selected the strongest versions, exported them as MP3s, and prepared them for analysis in this dashboard.
With both tracks finalized, the next step was to dive into the data, starting with an exploration of its features.
To begin my analysis, I wanted a straightforward visualisation that would let me place my own tracks within the context of the class corpus. I chose a scatterplot because it clearly separates each data point, which helped me to quickly identify how my tracks compare in terms of danceability and tempo, as these two features often shape the overall feel of a track.
This interactive scatterplot maps danceability against tempo, positioning my two AI-generated tracks within the entire class corpus.
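Such an interactive scatterplot only takes a few lines of ggplot2/plotly. A minimal sketch, assuming the class corpus has already been loaded as a data frame called `corpus` with `tempo`, `danceability`, and `filename` columns (all three names are my own placeholders, not the actual corpus columns):

```r
library(ggplot2)
library(plotly)

# Scatterplot of danceability against tempo; the text aesthetic feeds the
# hover tooltip so individual tracks can be identified interactively.
p <- ggplot(corpus, aes(x = tempo, y = danceability, text = filename)) +
  geom_point(alpha = 0.6) +
  labs(x = "Tempo (BPM)", y = "Danceability")

ggplotly(p, tooltip = "text")
```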
NOTE Interestingly enough, this scatterplot has changed quite a bit since I first generated it: originally, my ambient track (red) was positioned as both slow and undanceable, while the breakbeat track was slightly higher in both tempo and danceability. For reasons I cannot pin down, this changed sometime during the last few weeks; perhaps an adjustment to the class corpus features affected the estimated tempo of these two tracks. The plot now places my ambient track at over 150 BPM, even though I generated it to be much slower than that, and certainly not faster than the breakbeat track. My analysis below therefore still reflects the original positions, as those were rooted in how the tracks were actually generated.
There appears to be no clear correlation between danceability and tempo across the tracks. However, an interesting pattern emerges: there are two clusters, one with low danceability and another with high danceability, while tempo does not differ much between them.
Regarding my own tracks:
One particularly surprising observation is how the second song’s tempo was estimated. While I set it to 135 BPM, it was classified as 93 BPM, close to two-thirds of the intended tempo, a ratio at which beat trackers commonly err. This suggests that the analysis may have locked onto a different rhythmic layer or a half-time feel in its classification.
EDIT As stated above, here is another case of the estimated tempo shifting, this time for the breakbeat track. Tempo thus does not appear to be a very consistently evaluated variable; perhaps each method counts beats differently.
Overall, the plot allowed me to see individual track positions and emerging clusters. As visible here, there’s a group of tracks with low danceability and another with high danceability, while tempo remains relatively consistent across both clusters. This helped me gain more perspective into where my own tracks ‘live’ compared to the others, which created a solid mental foundation for further explorations.
Before diving into the analysis, it’s helpful to briefly explain what a chromagram is. A chromagram is a visual representation of harmonic content in music. It maps audio to the 12 pitch classes (C, C#, D, …, B), showing how energy is distributed across pitches over time, regardless of octave. This is especially useful when analyzing chord progressions, tonality, or harmonic density.
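The octave-folding idea behind a chromagram can be illustrated in a few lines of base R. This is only a conceptual sketch, not the extraction method used in this dashboard: it maps a single frequency to its pitch class, which is what a chromagram does for every spectral bin in every time frame.

```r
# Map a frequency in Hz to one of the 12 pitch classes by converting it to a
# MIDI note number and discarding the octave (A4 = 440 Hz = MIDI 69).
pitch_class <- function(freq_hz) {
  midi <- 69 + 12 * log2(freq_hz / 440)
  classes <- c("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")
  classes[(round(midi) %% 12) + 1]
}

pitch_class(440)     # "A" (A4)
pitch_class(261.63)  # "C" (middle C)
pitch_class(523.25)  # "C" again: one octave up folds onto the same class
```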
Creating these chromagrams proved to be a bit of a journey for me. I initially struggled because I missed the session where we were introduced to the compmus.R helper file. As I did not know it existed, I tried calculating chromagrams (and later, self-similarity matrices) using external packages like tuneR and seewave.
Eventually, I did manage to get my plots working, though it took a lot of trial and error. Even after discovering the compmus.R file, I ran into some persistent bugs, possibly due to updates to the helper functions or conflicts in how I had structured my analysis. For now, I’ve kept my original chromagram approach intact, since it renders reliably in R.
One thing I did notice was how the resulting spectrograms could benefit from better log-frequency mapping. Without it, the lower frequencies, especially in my ambient track, appear faint or almost non-existent. This is a fair reflection though, as the track is meant to be minimal and quiet.
Below are screenshots from my first attempt at generating chromagrams using waveform-based spectrograms via tuneR and seewave. While they visually resembled spectrograms, they did not accurately reflect pitch-class content and failed to render properly within the RMarkdown dashboard.
Track 1 — Ambient
- Energy is concentrated in the lower frequency range (below ~5 kHz)
- The overall amplitude is low, reflecting the sparse, ambient atmosphere
- The structure is fluid and minimal, with little rhythmic emphasis

Track 2 — Breakbeat
- Frequency content spans a much wider range, reaching above 10 kHz
- Brighter colors indicate greater amplitude and more harmonic variation
- Suggests a more rhythmic, melodic, and dynamic structure, consistent with breakbeat styles
So, although the chromagram for the ambient track appears faint and minimal, this is true to the source material which was designed to be slow, textured, and atmospheric. In contrast, the breakbeat track is vibrant and information-dense, with sharper contrasts and more visible harmonic shifts.
I later discovered that the compmus.R helper file includes a purpose-built compmus_chroma function, so I recalculated the chromagrams the correct way.
Track 1 — Evan-l-1.wav (Ambient):
There is a clear harmonic emphasis on F#, D#, and C#, with F# especially dominant. F# remains smoothly present and stable throughout most of the track, likely making it the tonic; track 1 can thus be described as a tonally consistent piece with little modulation. This aligns with the genre conventions of ambient music, where strong tonal consistency and slow shifts play an important role in creating the intended atmosphere.
Track 2 — Evan-l-2.wav (Breakbeat):
The harmonic content appears much more scattered across multiple pitch classes, with light bursts of color in D, D#, and G. The texture appears relatively unstable, which aligns with the song’s more complex and evolving harmonic structure. This also reflects the genre conventions of breakbeat, which usually features rapid tonal shifts.
After exploring harmonic content through chromagrams, I turned to self-similarity matrices (SSMs) to better understand the internal structure of my two tracks.
SSMs compare every moment in a song with every other moment, based on chosen features like chroma or timbre. This creates a square matrix where repeated or similar sections appear as distinct patterns—such as diagonal lines (repetitions), blocks (sections), or checkerboards (alternating segments). They’re very useful for spotting structural repetition, dynamic contrast, and transitions that might not be immediately audible.
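The core computation behind an SSM is simple enough to sketch in base R. This toy example (not the compmus implementation, and not my tracks’ actual data) treats each row of a matrix as one time frame and scores every pair of frames with cosine similarity:

```r
# Cosine similarity between two feature vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Build the square self-similarity matrix: cell (i, j) compares frame i
# with frame j, so the diagonal is always 1 (every frame matches itself).
self_similarity <- function(features) {
  n <- nrow(features)
  outer(1:n, 1:n, Vectorize(function(i, j) cosine(features[i, ], features[j, ])))
}

# Three toy frames: frames 1 and 3 are identical, frame 2 is orthogonal to
# both, so the repetition shows up as 1s off the diagonal.
features <- rbind(c(1, 0), c(0, 1), c(1, 0))
self_similarity(features)
```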
Within the context of my musical exploration these SSMs helped me explore how much variety or repetition exists in my tracks—and whether those patterns aligned with what I intended in the composition prompts. As with the chromagrams, I initially used external packages to generate my self-similarity matrices (tuneR, seewave, and melfcc() from phonTools). Luckily, the output remains very clear and useful, even if slightly stylistically different from the class standard.
Despite being AI-generated, both tracks exhibit recognizable structural patterns that align with their intended genres. The ambient track relies on repetition and slow shifts, while the breakbeat track emphasizes variety and rhythmic segmentation. These SSMs were one of the clearest visual confirmations of that difference.
Chordograms/Keygrams offer a great way to visualize how harmonic templates such as keys align with the audio over time. They are derived from chroma features, but rather than showing raw pitch-class energy, they show how closely the signal matches known musical templates like major/minor chords.
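The template-matching idea can be sketched in base R. This is a deliberate simplification of what a keygram does (binary major-scale templates only, with no minor keys and no perceptual weightings), and the chroma vector at the end is made up for illustration rather than taken from my tracks:

```r
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Binary template for the C major scale, indexed C, C#, D, ..., B.
major_template <- c(1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1)
pitch_classes  <- c("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

# Rotate the template through all 12 transpositions and return the key whose
# template best matches the observed chroma vector.
best_key <- function(chroma) {
  scores <- sapply(0:11, function(k) {
    template <- major_template[((0:11 - k) %% 12) + 1]  # template for key k
    cosine(chroma, template)
  })
  pitch_classes[which.max(scores)]
}

# A chroma frame with energy on exactly the G major scale tones (G A B C D E F#):
g_major_chroma <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1)
best_key(g_major_chroma)  # "G"
```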
For this step, I finally managed to use the class’s compmus.R functions after struggling initially with the wrong feature format. Now that I had access to my .json chroma features, I could apply the pitch matching function. I wanted to generate keygrams to better understand the tonal content of each track. Since both songs belong to abstract or electronic genres (ambient and breakbeat), it wasn’t obvious what key either was in just by listening. So, a visualisation of harmonic similarity across key templates offered an analytical alternative.
Conclusively, the ambient track displays strong harmonic minimalism, while the breakbeat track displays harmonic variability. It is interesting how both of these reflect their respective genre conventions, and how it would be easy to tell which track is which from the keygrams alone.
Before analyzing my own tracks’ tempo structures in detail, I first wanted to understand how tempo is distributed across the entire class corpus. My goal was to see where my own AI-generated tracks might fit within the broader spectrum of musical pacing in the dataset.
To do this, I chose to visualize the tempo of each track using a histogram. A histogram is one of the most straightforward ways to represent a frequency distribution, and in this context it clearly displays which tempos are most common in the class corpus. I also used Plotly to make the plot interactive, so individual tempo bins can be hovered over to reveal exact values.
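A minimal sketch of this kind of interactive histogram, again assuming the corpus is available as a data frame `corpus` with a `tempo` column (both names are placeholders):

```r
library(ggplot2)
library(plotly)

# Histogram of corpus tempos; ggplotly adds hover tooltips with bin counts.
p <- ggplot(corpus, aes(x = tempo)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Tempo (BPM)", y = "Number of tracks")

ggplotly(p)
```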
From the histogram, we can see that the most frequent tempo (mode) in the class corpus is 85 BPM, with a count of 16 tracks. This relatively slow pace is commonly associated with genres like hip-hop, downtempo electronic, or ballads—genres that often center around groove or mood rather than rapid movement. It also aligns roughly with the human resting heart rate, which may contribute to its perceived ‘naturalness’.
The tempo distribution also shows smaller peaks in higher ranges, which could hint at clusters of tracks with faster or more danceable tempos like those found in house, pop, or techno.
After establishing the previous benchmark, I visualized the tempograms of my two AI-generated songs. These tempograms are generated from the JSON feature files and show how rhythmic intensity evolves over the duration of each track.
Comparing these tempograms visually strengthens the contrast in structure between my two tracks. The ambient piece is loosely timed, smooth, and much more textural, while the breakbeat track is rigid, fast-paced, and driven by its beat. This rhythmic contrast further supports what was established in earlier keygrams and SSMs, as each track is clearly distinct in both feel and temporal organization.
To finalize my feature exploration I wanted to visualize how different tracks group together based on their musical characteristics. Dendrograms, which visualize hierarchical clustering, are perfect for this kind of exploratory analysis: They show the relative similarity between tracks and form clusters where songs with similar musical features are grouped closer together.
I began by creating a dendrogram based on five track-level features: valence, arousal, danceability, tempo, and instrumentalness. This was mainly exploratory to give me a sense of how different the tracks in our class corpus really are when analyzed together. This general dendrogram already shows some distinct clustering patterns, although it’s based on a broader set of features.
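The clustering step itself only needs base R’s stats functions. Here is a sketch with toy data standing in for the class corpus (the real dendrogram clusters on valence, arousal, danceability, tempo, and instrumentalness; the values and track names below are invented):

```r
# Toy feature table: two calm tracks and one energetic one.
toy <- data.frame(
  valence = c(0.20, 0.25, 0.80),
  arousal = c(0.10, 0.15, 0.90),
  tempo   = c(60, 70, 135),
  row.names = c("ambient_a", "ambient_b", "breakbeat")
)

# Standardize the features so tempo's larger scale doesn't dominate the
# distances, then build and plot the hierarchical clustering.
hc <- hclust(dist(scale(toy)), method = "average")
plot(hc)  # the two calm tracks merge first; the breakbeat track joins last
```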
Since my two AI-generated tracks are drastically different in energy and mood, I wanted to focus more directly on clustering the corpus based on the emotional and rhythmic character of each song. For this, I narrowed it down to just valence, arousal and tempo.
In this mood-based dendrogram, tracks that cluster lower tend to be more similar in emotional character. In this case, that often means relaxed, calm, or lo-fi — including my ambient track. My breakbeat track, on the other hand, clusters higher up, which indicates that it’s more distinct and intense in mood and energy.
To conclude, this dendrogram’s pattern aligns reasonably well with my own two tracks. My ambient piece is grouped low in the dendrogram, nestled among similarly relaxed tracks, while the breakbeat track is positioned much higher, reflecting its chaotic energy and rhythmic aggression. Seeing this contrast play out in the structure of the dendrogram was especially satisfying, as it visualized something that is often only describable as a ‘vibe’ or ‘energy’, now measurably present in the data.
| Feature | Ambient Track | Breakbeat Track |
|---|---|---|
| Tempo | 60 BPM | 135 BPM (detected as ~93 BPM) |
| Danceability | Low | High |
| Energy | Low | High |
| Valence | Neutral | Moderate |
| Key Stability | Very stable (F#) | Shifting (D#/E) |
| Structure | Minimal, ambient | Chaotic, rhythmic |
Coming into this project without a strong background in either computational analysis or musicology, I wasn’t entirely sure what to expect. My experience with R was minimal and limited to a single data science course two years ago, and my musical understanding was more intuitive than technical. But working through this analysis gave me a surprisingly rich perspective on how musical qualities manifest in data.
What stood out to me the most was seeing the contrast between my two AI-generated tracks unfold across each visualization. Because the tracks were intentionally so different (one ambient and meditative, the other energetic and chaotic) the distinctions were clear, which made it much easier to understand what each method was actually showing.
Whether looking at chromagrams, self-similarity matrices, or keygrams, I could begin to interpret the visuals in a way that matched what I had heard intuitively. Even moments like sensing a key shift without knowing the actual notes, only to see that shift appear in the analysis, helped recontextualize how I listen to music altogether. It felt like learning a new language that allowed me to describe and understand music in ways I couldn’t before.
This kind of process could be especially valuable for beginners in musicological analysis, particularly those exploring electronic instrumental music. The contrast between tracks, when made visual, becomes a powerful learning tool. Starting with minimal R knowledge, I found that the code gradually became more intuitive. Flexdashboard, ggplot2, and the compmus helper functions were all far more accessible than I expected once I got into the rhythm of things. My own analysis could serve as a way to better understand these features for other ‘compmus beginners’ as well. Being able to see harmony, rhythm, structure, and energy not only supports deeper listening, but also helps bridge the gap between abstract sound and analytical clarity.